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Robust Inference for State-Space Models 
with Skewed Measurement Noise 
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Abstract —Filtering and smoothing algorithms for linear 
discrete-time state-space models with skewed and heavy-tailed 
measurement noise are presented. The algorithms use a varia¬ 
tional Bayes approximation of the posterior distribution of mod¬ 
els that have normal prior and skew-t-distrihuted measurement 
noise. The proposed filter and smoother are compared with 
conventional low-complexity alternatives in a simulated pseu¬ 
dorange positioning scenario. In the simulations the proposed 
methods achieve better accuracy than the alternative methods, 
the computational complexity of the filter being roughly 5 to 10 
times that of the Kalman filter. 

Index Terms —skew t, skewness, f-distribution, robust filtering, 
Kalman filter, RTS smoother, variational Bayes 
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Figure 1. Skewed distributions fit better than symmetric distributions to the 
time-of-flight measurement errors. BIC values for 2905 data points are 9600 
for skew t, 10500 for Student’s t, 17200 for normal, and 10000 for GM2. 


I. Introduction 

The Kalman filter (KF) [1] is the linear minimum mean- 
square-error filter for linear state-space models, but it is 
optimal within the set of all filters only when the noise 
processes are normally distributed [2]. However, the normal 
distribution has small tail probabilities, and real-world data 
typically contain large errors (“outliers”) more often than the 
normal distribution predicts [3]. Therefore, the KF is prone to 
large estimation errors when outliers occur. Hence, there is a 
need for filtering and smoothing algorithms that mitigate the 
outlier measurements’ influence. 

Many applications involve noise processes that have both 
heavy-tailed (high-kurtosis) and asymmetric (skewed) distri¬ 
butions. In radio signal based distance estimation [4], [5], for 
example, non-line-of-sight causes large positive errors [6], [7]. 
Fig. 1 shows the error histogram of a time-of-flight based ultra- 
wideband distance measurement experiment^ and maximum 
likelihood fits of some probability distribution families. By 
the Bayesian information criterion (BIC) [8], the skewed 
distributions skew t [9, Ch. 4.3] and two-component Gaussian 
mixture (GM2) model the data better than the symmetric 
Student’s t [10, Ch. 28] and normal. Other applications for 
asymmetric distributions have emerged in biostatistics [11], 
psychiatry [12], environmetrics [13], and economics [14]. 

Despite these applications, a computationally efficient es¬ 
timation algorithm for time-series data with heavy-tailed and 
asymmetric noise has been missing. Robust algorithms that 
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model the heavy-tailed noise with a f-distribution are proposed 
in [15]-[17], but these do not use the skewness information. 
A GM2 can model skewness, but the number of mixture 
components in the posterior increases exponentially with the 
number of measurements. Furthermore, the GM2 has heavy 
tails only within a limited range near the component locations, 
and it has five parameters, while four suffices for modeling 
location, spread, skewness and kurtosis. Particle Alters (PF) 
[18] can cope with a wide range of models including skewed 
noise processes, but their computational complexity increases 
rapidly as the state dimension increases. 

This letter proposes approximations to the Bayesian Alter 
and smoother that retain the computational efficiency of the 
KF while introducing more modeling flexibility for skewed 
and heavy-tailed measurement noise. The measurement noise 
is modelled by the skew f-distribution, and the proposed 
algorithms use a variational Bayes (VB) approximation of the 
posterior. The proposed Alter and smoother are evaluated by 
numerical pseudorange positioning simulations, where they 
are compared with the state-of-the-art computationally light 
algorithms and a PF. To our knowledge, the only earlier work 
applying VB approximations to the skew f-distribution is that 
of Wand et al. [19]. However, Wand et al. do not consider 
state-space models and time-series estimation. 


II. Skew f-DisTRiBUTiON 

Skewed extensions of the well-known unimodal symmetric 
distributions have been studied since the introduction of the 
skew normal distribution by Azzalini in [20]. The univariate 
skew t-distribution is parametrized by its location parameter 
fj, G M., spread parameter tr > 0, shape parameter (5 G K 
and degrees of freedom ly > 0, and has a probability density 
function (PDF) of the form 

ST(z; 6, v) = 2t(z; + cr^, i/) T(?; 0,1, + 1), 

( 1 ) 


where 




( 2 ) 
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Figure 2. The PDF ST( 2 :; 0,1, 5,4) for different shape pai'ameter values <5. 


is the PDF of Student’s t-distribution, r( ) is the gamma func¬ 


tion, and z = ' . Also, T(-; 0,1, u) 

denotes the cumulative distribution function (CDF) of Stu¬ 
dent’s f-distribution with degrees of freedom v. The PDF 
ST(z; 0,1, i5,4) is plotted for six different values of shape 
parameter 5 in Fig. 2. The skew f-distribution approaches 
normal distribution when v ^ oo and <5 —0. Expressions 
for the first two moments of the univariate skew t-distribution 
with the parametrisation (1) can be found in [21]. 

Following the introduction of the multivariate skew normal 
distribution in [22], multivariate skew t-distributions have been 
proposed in [23]-[25]. In these versions, the PDF of the 
skew t-distribution involves only the univariate CDF of t- 
distribution, while the definition of skew t-distribution given in 
[26]-[28] involves the multivariate CDF, but a single kurtosis 
factor. In this letter the measurement noise distribution is a 
product of independent univariate skew t-distributions. This 
less general model is justified in applications where one¬ 
dimensional data from different sensors can be assumed to 
be statistically independent. 


III. Problem formulation 

Consider the linear state-space model with skew-t- 

distributed measurement noise 

Xk+i= Axk+Wk, Wk'^ (3a) 

yk=Cxk + ek, [cfcji ~ ST([efe]j;0,i?*i, (3b) 

where E) denotes a (multivariate) normal PDF with 

mean /r and covariance E; A S is the state transition 

matrix; x^ G indexed by 1 < A: < IT is the state to be 
estimated with prior distribution 

p(a;i) = A/'(a;i;a;i|o,Pi|o); (4) 

where the subscript “a|6” is read “at time a using measure¬ 
ments up to time 6”; G K"® also indexed by 1 < fc < iT 
are the measurements and the elements of y^ are conditionally 
independently skew-f-distributed; H G is a diagonal 

matrix whose diagonal elements Ra are the squares of the 
spread parameters of (3b); A G is a diagonal matrix 

whose diagonal elements A^^ are the shape parameters of (3b); 
1 / g is a vector whose elements are the degrees of 
freedom of (3b); C G is the measurement matrix; 

{wk G < k < K} and {ck G K"®|1 < k < K} are 

mutually independent noise sequences; and the operator [ j^ 
gives the {i,j) entry of its argument. 

The aim of this letter is to derive a Bayesian filter and 
a Bayesian smoother using the VB method that computes 
an approximation of the filtering distribution p{xk\yi-.k) and 
smoothing distribution p{xk\yi-.K)- 


IV. Variational solution 

The likelihood function implied from (3b) has the hierar¬ 
chical representation [27] 

yk \ xk^ A/j; ~ {Cxk -f Au/j;, 77), (5a) 

(5c) 

Afc is a diagonal matrix with independent random diagonal 
elements [A^jii, and A/+(/j., E) denotes the (multivariate) 
truncated normal distribution with closed positive orthant as 
support, location parameter p, and squared-scale matrix E. 
Furthermore, Q{a,f3) denotes the gamma distribution with 
shape parameter a and rate parameter /3. 

Using Bayes’ theorem, the likelihood (5) and the prior (4), 
the joint smoothing posterior PDF can be written as 

K-l 

P{xi,K,Ui,K, ■^l-.K\yi-.K) p{XiJ^i\Xi) 

1^1 

K 

X \Yp{yk\xk,Uk,I^k)p{uk\Kk)p{Kk) (6) 

k^l 

K-l 

=A/'(xi;xi|o,Pi|o) N{xi^i\AxuQ) 

1^1 

K 

X WM{yk]Cxk + Aufc,A^^i?)A/'+(wfc;0, 

k^l 

K 

X nn(^([Afc],,;|,|). (7) 

k=l i=l 

This posterior is not analytically tractable. We seek an approx¬ 
imation in the form 


p{xi-.K,Ui:K,Ai-K\yi-.K) « qx{xi:K)qu{ui:K)qA{Ai,K)- 

( 8 ) 

In the VB approach, the Kullback-Leibler divergence 

(KLD) [29] of the tme posterior from the factorized approxi¬ 
mation is minimized; 

4, 7A = argmin 

DKL{qxixi.,K)qu{ui:K)qAiM-.K)\\p{xi.,K,Ui.,K,Ai,K\yi-.K)) 

where DKL{q{-)\\pi-)) = f log dx is the KLD. The 
analytical solutions for q^, qu and can be obtained by cyclic 
iteration of 

\ogqx{x\:K) E \d'^gp{yi-.K,Xl,K,Uv.K,A^_,K)]+Cx (9a) 

logg„(Ml:A)^ E [logp(l/l;K,a:i:if,ai:if,Al:A)] -I-C„ (9b) 

qA 

loggA(Al:A:) E [logp(j/l:A:, XIiK, M 1 :X, Al;x)] -f ca (9c) 

q^q-u. 

where the expected values on the right hand sides of (9) are 
taken with respect to the current q^, qu and qa [30, Chapter 
10] [31], [32]. Also, Cx, Cu and ca are constants with respect 
to the variables Xk, Uk and Afc, respectively. This recursion 
is convergent to a local optimum [30, Chapter 10]. When 
the iterations converge, approximate densities and qj^ are 
integrated out from the right hand side of (8) by simply 
discarding them. Then, the approximate marginal smoothing 
density qx{xk) is obtained, and it turns out to be a normal 
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Table I 

Smoothing for skew-* measurement noise 


Table II 

Filtering eor skew-* measurement noise 


1 : 

2 : 

3: 

4: 

5: 

6 : 

7: 

8 : 

9: 

10 : 

11 : 

12 : 

13: 

14: 

15: 

16: 

17: 

18: 

19: 

20 : 

21 : 

22 : 

23: 

24: 

25: 


26: 


27 

28 

29 

30 


Inputs: A, C, Q, R, A, u, x^q, Pi|o and yi-.K 
initialization 

Iny for k = !■■ ■ K 
Uk 3— 0 for k = 1 ■ ■ ■ K 

repeat 

update qx{xi:K) given and gA(Ai;x) 

for fc = 1 to K do _ 

K, ^ Pfe|fe-iCT(CPfc|fe_iCT + Afc- P)-i 
^k\k ^ ^k\k—i A kCxidJk Aii^) 

Pk\k ^ (l — 

predict qx{xk+r) 

^k + l\k ^ 

Pk + l\k + Q 

end for 

for fc = K-1 down to 1 do 

Gk ^ Pk\kA'^Pkl^\k 

^k\K ^ ^k\k A ^A;(a:fc + 1|K‘ 

Pk\K Pk\k A Gfc(Pfc + l|X “ Pk + l\k)Gj 

end for 

update qu{ui:K) and iJa(Ai:k) given qx{xr:K) 

for fc = 1 to K do 

update qu{uk) = A/'+(nfe; P^Ik) 

Uk = Vk — Cxf.\if 

iT„ ^ A(A2^+ P)-i 
f4iK ^ - p'.a)a^-' 

Uk *x- [«fe] l> see [34] for the formula 

for i = 1 to Hy do 

Pa ^ Ejv'+(„fc,^,Ufc|^)[K]?] I> see [34] for the formula 

end for 

update qA(Ak) = 0"=! & ^Al, A±I£iJii) 

^ + CP;,|kC'^) + (AP-IA + I)T 

— R~^AukU^ — AR~^UfcUk^ 

[Ak]ii 3- .. 

end for 

until converged 

Outputs: x^fd and Pk\K ft**" k = 1 ■ ■ ■ K 


distribution qx{xk) = N[xk] Xk\x: Pk\K) where the parame¬ 
ters Xk\K 4nd Pk\K 4re the output of the smoothing algorithm 
given in Table I. The filtering algorithm and the parameters 
of the filtering posterior qx(xk) = ■N'{xk\Xk\k,Pk\k) can be 
found in Table 11. The derivations for the expectations given 
in (9) are relegated to [33] because of space constraints. 


1 : 

2 : 

3: 

4: 

5: 

6 : 

7: 

8 : 

9: 

10 : 

11 : 

12 : 

13: 

14: 

15: 

16: 


17: 


18: 

19: 

20 : 

21 : 

22 : 

23: 


Inputs: A, C, Q, R, A, u, x^q, Pi|o and yi.K 
for /c = 1 to K do 
initialization 

Afc ■<— Iriy 
0 

repeat 

update qx(xk) =-^{xk', Xk\k^ Pk\k) given qu{uk) and qh{Ak) 

Kx k- Pfe|fe_iCT(CPfe,fc_iCT -F A^-'r)-i 

^k\k ^k\k-l A Kxivk — — Auk) 

Pk\k ~ ^xC)Pk\k-l 

update quiuk) = Af+lrife; Pfcij.) given qx{xk) and q^i^k) 
Ku k- A(A2-fP)-i 
tCk — Vk GXk'^k 
“fe|fc KuUk 

Uk\k ^ {I-KuA)A^-^ 

Uk <- EAr^[uk^k.Ukik)^'^k] > see [34] for the formula 

for i = 1 to riy do 

Tii <- EAr+(u;,|fc,c/;,|fc) [[Wfe]?] > see [34] for the formula 

end for 

update qA(Ak) = flSl 5 ([Afejit; ^ + T 'A+pUi^ 
given qu{uk) and qx(xk) 

mk^R-^{ukuJ + CPk\kC'^)A(AR-^A + I)r 
— R~^ AukU^ — AR~^UkU}P 

[Ak]ii <- ^■+\tk]ii 

until converged 

predict qxixk+i) 

a:fc+i|fc ^ -^Xk^k 

Pk + l\k APk\kA'^ + Q 

end for 

Outputs: Xk\k 3nd Pk\k for k = 1 ■ ■ ■ K 


^ 3 x 3 > = 4 • Isxi, and A = 5 • / 3 X 3 , where 1 is a vector of 

ones. The VB iterations of STVBF and TVBF are terminated 
when the change in the estimate is less than 0.01. 

Some statistics of the estimation error are in Table 111, 
and Fig. 3 shows an example of the error processes. Table 
III shows that the STVBF has the lowest root-mean-square 
error (RMSE), the TVBF and KF-G have negative bias, and 
the KF’s error process has the highest standard deviation and 
positive skew. As illustrated by Fig. 3, the TVBF and KF-G 
react relatively slowly to positive errors, interpreting them as 
outliers to be discounted. The KF error’s skewness is caused by 
excessive sensitivity to the large positive measurement errors. 


V. Simulations 

Numerical simulations are carried out to evaluate the 
performance of the proposed algorithms Skew-f variational 
Bayes filter (STVBF) and Skew-f variational Bayes smoother 
(STVBS). The compared filters are t variational Bayes filter 
(TVBF) [16], the bootstrap Particle filter (PF), the Kalman fil¬ 
ter (KF), and the KF with measurement validation gating (KF- 
G) [35, Ch. 5.7.2] that discards the individual measurement 
components whose normalized squared innovation is larger 
than the Xi'distribution’s 99% quantile. The smoothers are 
t variational Bayes smoother (TVBS) [16], and Rauch-Tung- 
Striebel smoother with gating (RTSS-G) [36]. KF and RTSS 
use the true mean and covariance of the measurement noise 
distribution, and the TVBF and TVBS use the true mean and 
{iz — 2)1 ir times the true covariance as the shape matrix. The 
computations are done using Matlab. 

A. One-dimensional positioning 

The simulation consists of 1000 100-step random-walks of 
model (3) with parameters A = 1, Q = 1, C = I 3 XI, R = 



Figure 3. One-dimensional positioning example illustrates TVBF estimate’s 
negative bias and KF’s sensitivity to outliers. Measurement error of 300 at 
time instant 49 is not shown. 


Table III 

Error statistics in one-dimensional positioning 


Filter 

RMSE 

Mean 

Standard deviation 

Skewness 

STVBF 

1.2 

0.1 

1.2 

0.0 

TVBF 

1.5 

-0.8 

1.3 

0.2 

KF-G 

1.5 

-0.5 

1.4 

0.1 

KF 

1.6 

0.0 

1.6 

0.5 
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(a) 5 = 2 (b) 5 = 5 


Figure 4. Convergence of the STVBF with q = 10. Ten STVBF iterations 
is enough to outperform TVER One additional VB iteration gives the same 
accuracy gain as 100 additional PF particles. 


B. Pseudorange positioning 

GNSS-type (global navigation satellite system) pseudorange 
measurements are simulated from the model 

[yk]i = ||si - [xfe]i: 3 || + [xkU + [efc]i, [efe]j ~ ST(0,1, 4) 

( 10 ) 

where Si is the ith satellite’s position, \xk]A is bias, is noise, 
and 5 is varied. The model is linearized, and the linearization 
error is negligible because the satellites are far from the 
receiver. The state model is a three-dimensional random walk 
with process noise covariance matrix Q = diag(g^,0.5^), 
where g is a parameter. The constant bias [xk\A has prior 
A/^(0,0.75^). Satellite constellations of Global Positioning 
System provided by the International GNSS service [37] are 
used, and on average 7.6 satellites are measured. The results 
are based on 1000 Monte Carlo replications of a 100-step 
trajectory. The RMSE is computed for the components [a:fe]i: 3 . 

1) Evaluation of the filter: Fig. 4 studies the convergence 
of the STVBF’s VB iteration with q = 10. The speed of 
convergence depends on the parameters of the model; the 
larger <5, the slower convergence, and large q and a high 
number of sensors can also increase the required number 
of iterations. The RMSE reduction is fastest for the first 
iterations, 10 iterations is enough to outperform TVBF, and 
after 30 iterations the RMSE reduction is negligible. Thus, 
the STVBF is slower than the TVBF that requires 5 iterations. 
In this example, one additional VB iteration gives the same 
accuracy gain as 100 additional PF particles. In the remaining 
numerical examples, STVBF’s VB iteration is terminated after 
30 iterations, and TVBF’s after 10 iterations. 

Fig. 5 shows the distributions of the RMSE differences of 
the comparison methods from the STVBF’s RMSE as percent¬ 
ages of the STVBF’s RMSE. The levels of the boxes are 5 %, 
25 %, 50 %, 75 %, and 95 % quantiles. With q > 1, the STVBF 
outperforms the comparison methods in significant majority of 
the replications. The problems with q = 0.1 are explained by 
the model structure; only sums of Xk and Uk are measured, so 
Xk and Uk are correlated a posteriori, which makes the VB 
approximation underestimate the posterior variance [30, Ch. 
10.1.2]. The STVBF works well only when the process noise 
has enough dispersion to dominate in the prior’s variance, i.e. 
when the signal-to-noise ratio (SNR) is not very low. 

2) Real-world noise: The robustness of the STVBF is eval¬ 
uated by generating the noise in Eq. (10) from the histogram 
distribution of the time-of-flight data set of Fig. 1 and using 
q = 10. The histogram of the RMSE differences of TVBF 
from the RMSE of STVBF is in Fig. 6. The proposed method 
has lower RMSE than the TVBF in 61 % of the 1000 Monte 
Carlo replications. This indicates that the proposed filter is 
robust to small deviations from the model that appear in real 
data. 


IT 10 

S 

I “ 
1-10 
S-20 

“ -30 




1 TVBF 
I KF-G 


(a) g = 0.1 


(c) q = 10 


-L -in^-^ 



ty 


2 3 4 

S 

(b) q = l 



1 



--fj 

L..j 


(d) g = 40 


Figure 5. RMSE differences per cent of the STVBF’s RMSE. The proposed 
STVBF outperforms the comparison methods with skewed measurements 
when the signal-to-noise ratio is high enough. 



RMSE difference (%) 

Figure 6. RMSE difference of TVBF per cent of the STVBF’s RMSE 
with noise generated from real time-of-flight measurements’ error histogram. 
STVBF has lower RMSE than the TVBF in 61 % of the 1000 replications. 


3) Evaluation of the smoother: The smoother versions of 
the compared algorithms are evaluated in the same simulation 
of Eq. (10) with skew-f noise. The STVBS uses 30 and the 
TVBS 10 VB iterations, which were observed to provide 
convergence. Fig. 7 shows that the STVBS outperforms the 
TVBS also at low SNR, but the percentile differences at high 
SNR are smaller than those of the corresponding filters. 



50 
£ 40 

I 

I 20 
3 10 




_ k 



^ T T T 

T 


(a) g = 1 


(b) g = 40 


Figure 7. Smoothers’ RMSE differences per cent of the STVBS’s RMSE. 
STVBS performs well also at low SNR, but difference to TVBS is smaller 
than the difference between the corresponding filters. 


VI. Conclusions 

A filter and a smoother that take into account the skewness 
and heavy-tailedness of the measurement noise are proposed. 
The algorithms use the variational Bayes approximation. In 
the presented computer simulations the proposed methods out¬ 
perform the conventional symmetric Kalman-type algorithms 
when skewness is present. The computational burden depends 
on the measurement dimension and model parameters. In the 
presented simulations the proposed filter has roughly 5 to 10 
times the Kalman filter’s computational cost. 
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